Large Language Models

PaSa

PaSa is an advanced academic paper search agent developed by ByteDance and powered by a large language model (LLM). It autonomously invokes search tools, reads papers, and selects relevant references to return comprehensive and accurate results for complex academic queries. The agent is optimized with reinforcement learning on AutoScholarQuery, a synthetic dataset, and on the real-world query dataset RealScholarQuery it significantly outperforms traditional search engines and GPT-based methods. Its main advantages are high recall and precision, giving researchers a more efficient way to search the academic literature.

self-adaptive-llms

SakanaAI/self-adaptive-llms implements Transformer², an adaptive framework that addresses two weaknesses of traditional fine-tuning: it is computationally intensive, and the resulting models remain static when faced with diverse tasks. Transformer² adapts a large language model (LLM) in real time during inference through a two-step mechanism: first, a dispatch system identifies the properties of the task; then, task-specific 'expert' vectors trained with reinforcement learning are dynamically mixed to produce the target behavior for the incoming prompt. Its key advantages are real-time task adaptation, computational efficiency, and flexibility. The project was developed by the SakanaAI team and is open source on GitHub, where it had 195 stars and 12 forks at the time of writing.
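
The two-step adaptation can be sketched in a few lines. This is a minimal illustration that assumes the Singular Value Fine-tuning (SVF) parameterization from the Transformer² paper, in which each expert vector rescales the singular values of a frozen weight matrix. The function names, toy singular values, and mixing weights are hypothetical, and rebuilding the matrix as U diag(σ·z) Vᵀ is left out to keep the example dependency-free.

```python
from typing import List, Sequence


def mix_expert_vectors(experts: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Second step: linearly combine task-specific expert vectors z_k with weights alpha_k."""
    if len(experts) != len(weights):
        raise ValueError("need exactly one mixing weight per expert vector")
    dim = len(experts[0])
    return [sum(w * z[i] for w, z in zip(weights, experts)) for i in range(dim)]


def adapt_singular_values(sigma: Sequence[float], z: Sequence[float]) -> List[float]:
    """Rescale each singular value of a frozen weight matrix by the mixed expert vector."""
    if len(sigma) != len(z):
        raise ValueError("expert vector must match the number of singular values")
    return [s * zi for s, zi in zip(sigma, z)]


# Hypothetical singular values of one weight matrix and two expert vectors.
sigma = [4.0, 2.0, 1.0]
math_expert = [1.0, 0.5, 0.0]
code_expert = [0.0, 1.0, 2.0]

# Hypothetical weights from the first step, which judged the prompt to be mostly math.
alpha = [0.75, 0.25]

z = mix_expert_vectors([math_expert, code_expert], alpha)
print(adapt_singular_values(sigma, z))  # [3.0, 1.25, 0.5]
```

Only the small vector z changes per prompt, which is why this style of adaptation is cheap compared with swapping in separately fine-tuned weights.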

Sonus-1

Sonus-1 is a family of large language models (LLMs) from Sonus AI, designed to push the boundaries of artificial intelligence. The models target high performance and versatility across many applications and come in several versions for different needs: Sonus-1 Mini, Sonus-1 Air, Sonus-1 Pro, and Sonus-1 Pro (with Reasoning). Sonus-1 Pro (with Reasoning) performs strongly on multiple benchmarks, especially in reasoning and mathematics, where it surpasses other proprietary models. Sonus AI is committed to building high-performance, affordable, reliable, and privacy-focused LLMs.

FlagEval

FlagEval is an evaluation platform for large language models and multimodal models. It provides a fair, transparent environment for comparing models under the same standards, helping researchers and developers understand model performance and advancing AI technology. The platform covers many model types, including conversational models and vision-language models, evaluates both open-source and closed-source models, and offers specialized evaluations such as K12 subject assessments and quantitative financial trading evaluations.

CosyVoice 2

CosyVoice 2 is a speech synthesis model developed by the SpeechLab@Tongyi team at Alibaba Group. Built on supervised discrete speech tokens, it combines two popular generative approaches, language models (LMs) and flow matching, to achieve high naturalness, content consistency, and speaker similarity. It is particularly relevant to multimodal large language models (LLMs) and interactive applications, where response latency and the real-time factor of speech synthesis are critical. CosyVoice 2 improves utilization of the speech token codebook through finite scalar quantization (FSQ), simplifies the text-to-speech language model architecture, and uses a chunk-aware causal flow matching model that adapts to different synthesis scenarios. Trained on a large-scale multilingual dataset, it achieves human-parity synthesis quality with very low response latency and real-time performance.
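
Finite scalar quantization replaces a learned codebook lookup with per-dimension rounding, so every code in the implicit codebook is reachable. The sketch below assumes the common FSQ formulation (tanh bounding, rounding to 2k+1 integer levels, mixed-radix indexing); the three dimensions and k = 1 are illustrative rather than CosyVoice 2's actual configuration, and the straight-through gradient used during training is omitted.

```python
import math
from typing import List, Sequence


def fsq_quantize(h: Sequence[float], k: int) -> List[int]:
    """Bound each dimension to (-k, k) with tanh, then round to an integer in [-k, k]."""
    return [round(k * math.tanh(x)) for x in h]


def fsq_token_index(q: Sequence[int], k: int) -> int:
    """Read the quantized vector as a base-(2k+1) number to get a single speech-token id."""
    base = 2 * k + 1
    return sum((qj + k) * base ** j for j, qj in enumerate(q))


# Hypothetical 3-dimensional projection of one speech frame, with k = 1
# (three levels per dimension, so the implicit codebook has 3**3 = 27 entries).
q = fsq_quantize([0.0, 3.0, -3.0], k=1)
print(q, fsq_token_index(q, k=1))  # [0, 1, -1] 7
```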

Command R7B

Command R7B is a high-performance, scalable large language model (LLM) from Cohere, designed specifically for enterprise applications. It delivers top-tier speed, efficiency, and quality in a compact model, significantly lowering the cost of deploying AI applications in production on commodity GPUs, edge devices, or even CPUs. Command R7B excels at multilingual tasks, retrieval-augmented generation (RAG), reasoning, tool use, and agentic behavior, making it a strong fit for enterprises that prioritize speed, cost efficiency, and compute resources.

MLPerf Client

MLPerf Client is a new benchmark developed in collaboration with MLCommons to evaluate how personal computers, from laptops to desktops and workstations, run large language models (LLMs) and other AI workloads. It simulates real-world AI tasks to provide clear metrics on how systems handle generative AI workloads. The MLPerf Client working group hopes the benchmark will drive innovation and competition and help ensure that personal computers can meet the demands of an AI-driven future.

Model Training and Deployment

InternVL2_5-38B

InternVL 2.5 is a series of multimodal large language models from OpenGVLab that builds on InternVL 2.0 with significant improvements to training strategies, test-time strategies, and data quality. The series processes images, text, and video, demonstrates strong multimodal understanding and generation, and sits at the forefront of multimodal AI. Its high performance and open-source release make it a robust foundation for multimodal tasks.

Sandbox Fusion

Sandbox Fusion is a versatile code sandbox designed for large language models (LLMs). It supports up to 20 programming languages and covers testing in domains including programming, mathematics, and hardware programming. It integrates more than 10 coding-related evaluation datasets with standardized data formats, all accessible through a unified HTTP API. Sandbox Fusion is optimized for deployment on cloud infrastructure and provides built-in security isolation when privileged containers are available. Developed by ByteDance, it aims to give developers a secure and efficient environment for testing code.

Development & Tools

Star-Attention

Star-Attention is a novel block-sparse attention mechanism proposed by NVIDIA to improve the inference efficiency of Transformer-based large language models (LLMs) on long sequences. It operates in two phases, blockwise-local attention over the context followed by global attention for the query and generated tokens, which significantly speeds up inference while retaining 95-100% of the original accuracy. It is compatible with most Transformer-based LLMs and can be used directly without additional training or fine-tuning, and it can be combined with other optimizations such as Flash Attention and KV cache compression to further improve performance.
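
The context-partitioning step of the first phase can be sketched as follows. This assumes the anchor-block scheme described in the Star Attention paper, where every context block after the first is prefixed with the first block before local attention; tokens are plain characters for readability, and the second phase (query tokens attending globally to the cached blocks of all hosts) is not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def star_phase1_blocks(context: Sequence[T], block_size: int) -> List[List[T]]:
    """Split the context into contiguous blocks and prefix every block except the
    first with the anchor (first) block, as each host would see it in phase 1."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [list(context[i:i + block_size]) for i in range(0, len(context), block_size)]
    if not blocks:
        return []
    anchor = blocks[0]
    # Each host attends locally over [anchor + its own block]; afterwards only the
    # KV cache of its own block is kept and the anchor entries are discarded.
    return [anchor] + [anchor + block for block in blocks[1:]]


print(star_phase1_blocks(list("abcdef"), block_size=2))
# [['a', 'b'], ['a', 'b', 'c', 'd'], ['a', 'b', 'e', 'f']]
```

Because each host only attends within a window of at most two blocks, phase-1 cost grows linearly with the number of blocks instead of quadratically with the full context length.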

Model Training and Deployment

Model Context Protocol Servers

Model Context Protocol Servers is a project that showcases the versatility and scalability of the Model Context Protocol (MCP). It provides a set of reference implementations and community-contributed servers that demonstrate how to use MCP to provide secure, controlled access to tools and data sources for large language models (LLMs). Each MCP server is implemented using the TypeScript MCP SDK or Python MCP SDK. Managed by Anthropic and built with the community, this project is open source and encourages contributions of servers and improvements.
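
MCP messages use JSON-RPC 2.0, so a client asks a server to run a tool with a `tools/call` request. The minimal sketch below builds such a request with the standard library only. The tool name `read_file` and its `path` argument are hypothetical, since each server advertises its own tools through `tools/list`.

```python
import json
from typing import Any, Dict


def make_tools_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request as an MCP client would send it."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool exposed by a filesystem-style server.
print(make_tools_call(1, "read_file", {"path": "notes.txt"}))
```

In practice the TypeScript or Python MCP SDK constructs and transports these messages; the sketch only shows their shape.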

Large Language Models

WorkflowLLM

WorkflowLLM is a data-centric framework designed to enhance the orchestration capabilities of large language models (LLMs). At its core is WorkflowBench, a large-scale supervised fine-tuning dataset containing 106,763 samples from 1,503 APIs across 83 applications and 28 categories. WorkflowLLM fine-tunes the Llama-3.1-8B model to create the WorkflowLlama model optimized specifically for workflow orchestration tasks. Experimental results indicate that WorkflowLlama excels in orchestrating complex workflows and generalizes well to unseen APIs.

Workflow Orchestration

Agora

Agora is a simple, cross-platform protocol that lets heterogeneous large language models (LLMs) communicate efficiently through negotiation. Agents use natural language for infrequent communication and negotiate a structured data protocol (for example, JSON) for frequent interactions. Once a protocol is established, the LLMs use routines, simple scripts (for example, in Python), to send and receive data. Subsequent communication runs through these routines, reducing dependence on LLM calls and improving efficiency, versatility, and portability.
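
The split between rare natural-language messages and frequent routine-driven messages can be sketched as below. This is a hypothetical illustration, not Agora's reference implementation: `NegotiatedChannel`, `weather_request_v1`, and the protocol id `weather-v1` are invented names, and the LLM is replaced by a stub function.

```python
import json
from typing import Any, Callable, Dict, Optional


class NegotiatedChannel:
    """Hypothetical Agora-style channel: message types with a negotiated protocol go
    through a routine; anything else falls back to the (stubbed) LLM."""

    def __init__(self, llm_fallback: Callable[[str], str]) -> None:
        self.llm_fallback = llm_fallback
        self.routines: Dict[str, Callable[[Dict[str, Any]], str]] = {}

    def register_routine(self, protocol_id: str,
                         routine: Callable[[Dict[str, Any]], str]) -> None:
        self.routines[protocol_id] = routine

    def send(self, protocol_id: Optional[str], payload: Any) -> str:
        routine = self.routines.get(protocol_id) if protocol_id else None
        if routine is not None:
            return routine(payload)  # frequent case: deterministic script, no LLM call
        return self.llm_fallback(str(payload))  # rare case: natural language via the LLM


def weather_request_v1(payload: Dict[str, Any]) -> str:
    """Routine for a hypothetical negotiated 'weather-v1' JSON protocol."""
    missing = {"city", "date"} - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps({"city": payload["city"], "date": payload["date"]})


channel = NegotiatedChannel(llm_fallback=lambda text: "LLM: " + text)
channel.register_routine("weather-v1", weather_request_v1)
print(channel.send("weather-v1", {"city": "Paris", "date": "2025-01-01"}))
print(channel.send(None, "Can we agree on a format for weather queries?"))
```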

Development & Tools

5ire

5ire is an AI product centered on simplicity and user-friendliness, designed to enable even beginners to easily harness large language models. It supports the parsing and vectorization of various document formats and includes features such as a local knowledge base, usage analytics, a prompt library, bookmarks, and quick keyword search. As an open-source project, 5ire is available for free download and also offers a pay-as-you-go API service for large language models.

Knowledge Management

O1-Journey

O1-Journey is a project from the GAIR research group at Shanghai Jiao Tong University that aims to replicate and reimagine the capabilities of OpenAI's o1 model. It introduces a new training paradigm called 'journey learning' and has built the first model to integrate search and learning for mathematical reasoning. By learning from the whole reasoning process, including trial and error, correction, backtracking, and reflection, the approach proves effective on complex reasoning tasks.

Research Equipment

URL Parser Online

URL Parser Online is an online tool that converts complex URLs into an input format suitable for large language models (LLMs). It helps developers and researchers handle and parse URL data more effectively, particularly for web content analysis and data extraction. Demand for URL parsing and processing is growing alongside the explosive increase in internet data, and URL Parser Online addresses it with a simple user interface and efficient parsing. The service is currently free and targets developers and data analysts.

Development & Tools

SELA

SELA is an innovative system that enhances automated machine learning (AutoML) by integrating Monte Carlo Tree Search (MCTS) with LLM-based agents. Traditional AutoML methods often produce low-diversity and suboptimal code, limiting their effectiveness in model selection and integration. SELA represents pipeline configurations as trees, enabling agents to intelligently explore the solution space and iteratively refine strategies based on experimental feedback.
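
Searching a tree of pipeline configurations needs a rule for choosing which branch to expand next. The sketch below uses the generic UCT rule from Monte Carlo Tree Search; SELA's exact selection formula, node contents, and scoring may differ, and the pipeline names and scores are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One partial pipeline configuration (e.g. a chosen modeling strategy)."""
    name: str
    visits: int = 0
    total_score: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Upper Confidence bound for Trees: mean score plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited configurations first
    exploit = child.total_score / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    return max(parent.children, key=lambda child: uct(child, parent.visits, c))


def backpropagate(path: List[Node], score: float) -> None:
    """Propagate an experiment's validation score back up the explored path."""
    for node in path:
        node.visits += 1
        node.total_score += score


# Hypothetical tree: two strategies already tried, one not yet tried.
root = Node("root", visits=10)
gbm = Node("gradient boosting", visits=5, total_score=4.0)
linear = Node("linear model", visits=5, total_score=3.5)
stacking = Node("stacking ensemble")
root.children = [gbm, linear, stacking]

chosen = select_child(root)
print(chosen.name)  # stacking ensemble
backpropagate([root, chosen], score=0.9)  # feed back the experiment's result
```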

Model Training and Deployment

LongVU

LongVU is an innovative long-video language understanding model that uses a spatiotemporal adaptive compression mechanism to reduce the number of video tokens while preserving visual detail in long videos. This lets it process a large number of frames within a limited context length while losing only a minimal amount of visual information, significantly improving long-video understanding and analysis. LongVU outperforms existing methods on a variety of video understanding benchmarks, especially on videos up to one hour long. It also scales down effectively to smaller model sizes while maintaining state-of-the-art video understanding performance.
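
One ingredient of temporal compression is discarding frames that add little new information. The sketch below is a simplified, hypothetical version of that idea: it compares per-frame feature vectors by cosine similarity and drops near-duplicates of the last kept frame. The toy 2-D features and the 0.9 threshold are illustrative, and LongVU's full mechanism includes further reduction steps that are not shown.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant_frames(features: Sequence[Sequence[float]],
                          threshold: float = 0.9) -> List[int]:
    """Keep a frame only if it differs enough from the last kept frame; return kept indices."""
    kept: List[int] = []
    for i, feature in enumerate(features):
        if not kept or cosine(feature, features[kept[-1]]) < threshold:
            kept.append(i)
    return kept


# Hypothetical frame features: frames 1 and 3 nearly repeat their predecessors.
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
print(drop_redundant_frames(frames))  # [0, 2]
```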

Model Training and Deployment

FakeShield

FakeShield is a multimodal framework that addresses two main challenges in image forgery detection and localization (IFDL): detection mechanisms behave as black boxes, and they generalize poorly across different tampering methods. Using GPT-4o to enrich existing IFDL datasets, FakeShield builds a Multimodal Tampering Description Dataset (MMTD-Set) to train its tampering analysis capabilities. The framework includes a domain tag-guided explainable forgery detection module (DTE-FDM) and a multimodal forgery localization module (MFLM), which explain various types of tampering and guide localization through detailed textual descriptions. FakeShield outperforms other methods in detection accuracy and F1 score, offering a more accurate and interpretable solution.

BitNet

BitNet is Microsoft's official inference framework for 1-bit large language models (LLMs). It provides a set of optimized kernels that support fast, lossless inference of 1.58-bit models on CPUs (with NPU and GPU support coming soon). On ARM CPUs, BitNet achieves speedups of 1.37x to 5.07x and reduces energy consumption by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. BitNet can also run a 100B-parameter BitNet b1.58 model on a single CPU at speeds comparable to human reading, expanding the possibilities for running LLMs on local devices.
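
The "1.58-bit" label comes from ternary weights: each weight takes one of three values, which carries log2(3) ≈ 1.58 bits. The minimal sketch below assumes the absmean quantization scheme from the BitNet b1.58 paper (scale by the mean absolute weight, round, clip to [-1, 1]) and is written in plain Python for illustration. The inference framework itself works with weights that are already ternary and uses optimized kernels, which this does not model.

```python
import math
from typing import List, Sequence, Tuple


def absmean_ternary(weights: Sequence[float], eps: float = 1e-5) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, +1}: divide by the mean absolute value, round, clip."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale  # w is approximated by ternary[i] * scale


q, scale = absmean_ternary([0.9, -0.05, -1.2, 0.3])
print(q)  # [1, 0, -1, 0]
print(round(math.log2(3), 3))  # 1.585 bits per ternary weight
```

Since every weight is -1, 0, or +1, matrix multiplication reduces to additions and subtractions followed by one multiplication by the scale, which is what makes CPU inference of such models practical.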

Model Training and Deployment

awesome-LLM-resources

awesome-LLM-resources is a platform that aggregates global resources for large language models (LLMs), offering a range of tools and resources from data acquisition and fine-tuning to inference, evaluation, and real-world applications. Its significance lies in providing researchers and developers with a comprehensive resource library to facilitate the efficient development and optimization of their language models. Maintained by Wang Rongsheng, the platform is continuously updated, providing robust support for the advancement of the LLM field.

AI tools website directory

VirtualWife

VirtualWife is a virtual digital human project that aims to create a virtual companion with its own 'soul.' It supports live streaming on Bilibili and works with LLM backends such as OpenAI and Ollama. VirtualWife can provide emotional companionship and act as a relationship mentor and mental health consultant, meeting human emotional needs. The project is still in the incubation stage; the author has invested significant personal time in it and asks users to support its growth by giving it a star on GitHub.

AI virtual girlfriend

MM1.5

MM1.5 is a series of multimodal large language models (MLLMs) designed to improve understanding of text-rich images, visual referring and grounding, and multi-image reasoning. Built on the MM1 architecture, MM1.5 adopts a data-centric training approach and systematically studies the effect of different data mixtures across the entire training lifecycle. The models range from 1B to 30B parameters and include both dense and mixture-of-experts (MoE) variants. Through extensive empirical and ablation studies that document the training process and design decisions, the work offers valuable guidance for future MLLM research.

AutoDAN-Turbo

AutoDAN-Turbo is a fully automated framework that, without human intervention, discovers and combines strategies for jailbreaking large language models (LLMs). It automatically develops diverse attack strategies, significantly increasing attack success rates, and can incorporate existing human-designed jailbreak strategies into a unified framework. Its significance lies in improving the security and reliability of LLMs in adversarial settings by providing a new automated approach to red-teaming.

Lumigator

Developed by Mozilla.ai, Lumigator is a product that assists developers in choosing the most appropriate large language model (LLM) for their specific projects. It evaluates models using task-specific metrics, ensuring that the chosen models meet project requirements. Lumigator aims to become an open-source platform that promotes ethical and transparent AI development while addressing gaps in the industry toolchain.

AI Development Aids

Tilores Identity RAG

Tilores Identity RAG is a platform that provides customer data search, unification, and retrieval for large language models (LLMs). It uses real-time fuzzy search to tolerate spelling errors and inaccurate information, returning accurate, relevant, and unified customer data. It targets the challenges LLMs face when retrieving structured customer data: data scattered across many sources, records that are hard to find because search terms match only partially, and the complexity of unifying customer records. The platform retrieves structured customer data quickly, builds dynamic customer profiles, and supplies real-time, unified, accurate customer data at query time.
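
To illustrate how fuzzy matching tolerates misspellings before records are unified, here is a minimal sketch using Python's standard `difflib`. It is not Tilores' matching engine, whose algorithms the description does not specify; the records, the 0.8 threshold, and the merge rule are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_find(query: str, records: List[Dict[str, str]], key: str = "name",
               threshold: float = 0.8) -> List[Dict[str, str]]:
    """Return records whose field is similar enough to the (possibly misspelled) query."""
    return [r for r in records if key in r and similarity(query, r[key]) >= threshold]


def unify(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Merge matched records into one profile, keeping every distinct value per field."""
    merged: Dict[str, Set[str]] = {}
    for record in records:
        for field_name, value in record.items():
            merged.setdefault(field_name, set()).add(value)
    return {field_name: sorted(values) for field_name, values in merged.items()}


# Hypothetical records for one customer spread across two systems, plus an unrelated one.
records = [
    {"name": "Jon Smith", "email": "jon@example.com", "source": "crm"},
    {"name": "John Smith", "phone": "555-0100", "source": "billing"},
    {"name": "Jane Doe", "email": "jane@example.com", "source": "crm"},
]
profile = unify(fuzzy_find("John Smyth", records))
print(profile["name"], profile["source"])  # ['John Smith', 'Jon Smith'] ['billing', 'crm']
```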

AI development assistant

Mishi AI Community

The Mishi AI Community focuses on the intersection of artificial intelligence and product management, providing a comprehensive knowledge system and development use cases related to AI product management. Community members have the opportunity to become 'super individuals and one-person companies.' You can contact the community leaders via email or social media to join the AI PM community.

AI information platform

RD-Agent

RD-Agent is an automated research and development tool from Microsoft Research Asia that uses large language models to establish a new paradigm for automating AI-driven R&D processes. By integrating data-driven R&D systems, it applies AI to automate innovation and development, significantly improving R&D efficiency. Its intelligent decision-making and feedback mechanisms open up broad possibilities for future cross-disciplinary innovation and knowledge transfer.

AI Development Aids

NVLM 1.0

NVLM 1.0 is a family of advanced multimodal large language models (LLMs) from NVIDIA that achieves state-of-the-art results on vision-language tasks, rivaling leading proprietary and open-access models. Notably, after multimodal training NVLM 1.0 outperforms its LLM backbone on text-only tasks. NVIDIA has open-sourced the model weights and code for the community.

OneGen

OneGen is an efficient single-pass generation and retrieval framework for large language models (LLMs), intended for fine-tuning on generation, retrieval, or mixed tasks. Its core idea is to unify generation and retrieval in the same context by assigning retrieval to special retrieval tokens that the model generates autoregressively. This lets the LLM perform both tasks in a single forward pass. The approach lowers deployment costs and significantly reduces inference costs, because queries no longer require two separate forward passes.
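
The single-pass idea can be sketched as follows: the hidden state the model produces at a retrieval token is used directly as the query vector for nearest-neighbor search. This is a simplified illustration; the token name `[RQ]`, the 2-D hidden states, and the document embeddings are hypothetical, and how OneGen trains these token representations is not shown.

```python
import math
from typing import Dict, List, Sequence

RETRIEVAL_TOKEN = "[RQ]"  # hypothetical name for the special retrieval token


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_queries(tokens: Sequence[str],
                      hidden_states: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Hidden states at retrieval-token positions double as query embeddings,
    so no separate encoder pass over the query is needed."""
    return [h for tok, h in zip(tokens, hidden_states) if tok == RETRIEVAL_TOKEN]


def nearest_document(query: Sequence[float], documents: Dict[str, Sequence[float]]) -> str:
    return max(documents, key=lambda name: cosine(query, documents[name]))


# Hypothetical output of one forward pass: tokens and their final-layer hidden states.
tokens = ["Who", "wrote", "it", "?", RETRIEVAL_TOKEN]
hidden = [[0.1, 0.1], [0.3, 0.2], [0.5, 0.5], [0.0, 0.1], [0.2, 0.9]]
documents = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}

for query in retrieval_queries(tokens, hidden):
    print(nearest_document(query, documents))  # doc_b
```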

Featured AI Tools

Jules AI

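Jules is an asynchronous coding agent that automatically handles tedious coding tasks so you can spend your time on core coding work. Its main strength is its GitHub integration: it automates pull requests (PRs), runs tests, and verifies code on cloud virtual machines, greatly improving development efficiency. Jules suits a wide range of developers and is especially helpful for busy teams that need to manage their projects and code quality effectively.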

Development & Programming

NoCode

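NoCode is a platform that requires no programming experience: users describe their ideas in natural language and quickly generate applications. By lowering the barrier to development, it lets more people turn their ideas into reality. The platform offers real-time preview and one-click deployment, making it very easy to use even for users without technical knowledge.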

Development Platform

ListenHub

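ListenHub is a lightweight AI podcast generator that supports Chinese and English. Using state-of-the-art AI technology, it quickly generates podcast content on topics users are interested in. Its main strengths are natural conversation and very high audio quality, so listeners can enjoy a high-quality listening experience anytime, anywhere. ListenHub not only speeds up content generation but also supports mobile devices, making it convenient in many situations. Positioned as an efficient tool for acquiring information, it serves a broad range of listeners.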

Tencent Hunyuan Image 2.0 (腾讯混元画像 2.0)

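Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with major improvements in generation speed and image quality. It adopts an ultra-high-compression-ratio encoder-decoder and a new diffusion architecture, bringing image generation down to millisecond-level latency and avoiding the long waits of conventional generation. By combining reinforcement learning algorithms with human aesthetic knowledge, it also improves image realism and fine detail, making it well suited to professional users such as designers and creators.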

OpenMemory MCP

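OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). Users retain full control over their data and can keep it secure while building AI applications. The project supports Docker, Python, and Node.js, making it suitable for developers building personalized AI experiences, and it is recommended for users who want to use AI without exposing personal information.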

Open Source

FastVLM

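FastVLM is an effective vision encoding model designed for vision-language models. Its novel FastViTHD hybrid vision encoder reduces both the encoding time for high-resolution images and the number of output tokens, improving model throughput and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities and performs especially well on mobile devices where fast responses are required.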

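Pika

Pika is a video creation platform: users upload their creative ideas, and AI automatically generates videos based on them. Its main features are video generation from diverse ideas, professional video effects, and simple, easy-to-use operation. It offers a free trial and targets creators and video enthusiasts.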

LiblibAI

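LiblibAI is a leading AI creation platform in China. It offers powerful AI creation capabilities that support creators' creativity. The platform provides a vast number of free AI creation models that users can search for and use to create images, text, audio, and more, and it also supports training custom AI models. As a platform for a broad range of creators, it aims to provide equal creative opportunities, contribute to the creative industries, and let everyone enjoy the joy of creating.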

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase